Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
An evolving solution to address hallucination and enhance accuracy in large language models (LLMs) is Retrieval-Augmented Generation (RAG), which involves augmenting LLMs with information retrieved from an external knowledge source, such as the web. This paper profiles several RAG execution pipelines and demystifies the complex interplay between their retrieval and generation phases. We demonstrate that while exact retrieval schemes are expensive, they can reduce inference time compared to approximate retrieval variants because an exact retrieval model can send a smaller but more accurate list of documents to the generative model while maintaining the same end-to-end accuracy. This observation motivates the acceleration of the exact nearest neighbor search for RAG. In this work, we design Intelligent Knowledge Store (IKS), a type-2 CXL device that implements a scale-out near-memory acceleration architecture with a novel cache-coherent interface between the host CPU and near-memory accelerators. IKS offers 13.4--27.9× faster exact nearest neighbor search over a 512GB vector database compared with executing the search on Intel Sapphire Rapids CPUs. This higher search performance translates to 1.7--26.3× lower end-to-end inference time for representative RAG applications. IKS is inherently a memory expander; its internal DRAM can be disaggregated and used for other applications running on the server to prevent DRAM -- which is the most expensive component in today's servers -- from being stranded.more » « lessFree, publicly-accessible full text available March 30, 2026
-
Free, publicly-accessible full text available February 1, 2026
-
In this paper, we study the dynamics of fluids in porous media governed by Darcy’s law: the Muskat problem. We consider the setting of two immiscible fluids of different densities and viscosities under the influence of gravity in which one fluid is completely surrounded by the other. This setting is gravity unstable because along a portion of the interface, the denser fluid must be above the other. Surprisingly, even without capillarity, the circle-shaped bubble is a steady state solution moving with vertical constant velocity determined by the density jump between the fluids. Taking advantage of our discovery of this steady state, we are able to prove global in time existence and uniqueness of dynamic bubbles of nearly circular shapes under the influence of surface tension. We prove this global existence result for low regularity initial data. Moreover, we prove that these solutions are instantly analytic and decay exponentially fast in time to the circle.more » « less
-
Etessami, Kousha; Feige, Uriel; Puppis, Gabriele (Ed.)Motivated by recent progress on stochastic matching with few queries, we embark on a systematic study of the sparsification of stochastic packing problems more generally. Specifically, we consider packing problems where elements are independently active with a given probability p, and ask whether one can (non-adaptively) compute a "sparse" set of elements guaranteed to contain an approximately optimal solution to the realized (active) subproblem. We seek structural and algorithmic results of broad applicability to such problems. Our focus is on computing sparse sets containing on the order of d feasible solutions to the packing problem, where d is linear or at most polynomial in 1/p. Crucially, we require d to be independent of the number of elements, or any parameter related to the "size" of the packing problem. We refer to d as the "degree" of the sparsifier, as is consistent with graph theoretic degree in the special case of matching. First, we exhibit a generic sparsifier of degree 1/p based on contention resolution. This sparsifier’s approximation ratio matches the best contention resolution scheme (CRS) for any packing problem for additive objectives, and approximately matches the best monotone CRS for submodular objectives. Second, we embark on outperforming this generic sparsifier for additive optimization over matroids and their intersections, as well as weighted matching. These improved sparsifiers feature different algorithmic and analytic approaches, and have degree linear in 1/p. In the case of a single matroid, our sparsifier tends to the optimal solution. In the case of weighted matching, we combine our contention-resolution-based sparsifier with technical approaches of prior work to improve the state of the art ratio from 0.501 to 0.536. Third, we examine packing problems with submodular objectives. We show that even the simplest such problems do not admit sparsifiers approaching optimality. We then outperform our generic sparsifier for some special cases with submodular objectives.more » « less
-
In this work, we set out to find the answers to the following questions: (1) Where are the bottlenecks in a state-of-theart architectural simulator? (2) How much faster can architectural simulations run by tuning system configurations? (3) What are the opportunities in accelerating software simulation using hardware accelerators? We choose gem5 as the representative architectural simulator, run several simulations with various configurations, perform a detailed architectural analysis of the gem5 source code on different server platforms, tune both system and architectural settings for running simulations, and discuss the future opportunities in accelerating gem5 as an important application. Our detailed profiling of gem5 reveals that its performance is extremely sensitive to the size of the Ll cache. Our experimental results show that a RISC-V core with 32KB data and instruction cache improves gem5’s simulation speed by 31%-61% compared with a baseline core with 8KB Ll caches. Our paper is the first step toward building specialized hardware and software environments for accelerating software-based simulators.more » « less
An official website of the United States government

Full Text Available